Cluster Ensemble and Its Applications in Gene Expression Analysis

Authors

  • Xiaohua Hu
  • Illhoi Yoo
Abstract

A huge amount of gene expression data has been generated as a result of the Human Genome Project. Clustering has been used extensively to mine these data for important genetic and biological information. Obtaining high-quality clustering results is very challenging because different clustering algorithms produce inconsistent results and gene expression data are noisy. Many clustering algorithms are available, and they may generate different clustering results because of their differing biases and assumptions. Choosing the best clustering algorithm, and generating the best clustering results for a given data set, is therefore a daunting task for genomic researchers. In this paper, we present a cluster ensemble framework for gene expression analysis that generates high-quality, robust clustering results. In our framework, the clustering result of each individual clustering algorithm is converted into a distance matrix; these distance matrices are combined, and a weighted graph is constructed from the combined matrix. A graph partitioning approach is then used to cluster the graph and generate the final clusters. The experimental results indicate that the cluster ensemble approach yields better clustering results than the single best clustering algorithm on both a synthetic data set and a yeast gene expression data set.
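
The pipeline described in the abstract (per-algorithm distance matrices, a combined matrix, a weighted graph, and a graph partition) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the distance is taken to be 0 when two genes share a cluster and 1 otherwise, the matrices are averaged with equal weights, edge weights are 1 minus the combined distance, and the graph partitioning step is replaced by a simple stand-in (connected components over edges heavier than a threshold). All function names and the `min_weight` parameter are hypothetical.

```python
from itertools import combinations


def distance_matrix(labels):
    """Assumed encoding: distance 0 if two items share a cluster, else 1."""
    n = len(labels)
    return [[0.0 if labels[i] == labels[j] else 1.0 for j in range(n)]
            for i in range(n)]


def combine(matrices):
    """Average the distance matrices element-wise (equal weights assumed)."""
    n, m = len(matrices[0]), len(matrices)
    return [[sum(mat[i][j] for mat in matrices) / m for j in range(n)]
            for i in range(n)]


def build_graph(combined):
    """Weighted graph as an edge dict; weight = 1 - combined distance."""
    n = len(combined)
    return {(i, j): 1.0 - combined[i][j]
            for i, j in combinations(range(n), 2)
            if combined[i][j] < 1.0}


def partition(n, edges, min_weight=0.5):
    """Stand-in for graph partitioning: connected components over
    edges whose weight exceeds min_weight (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), weight in edges.items():
        if weight > min_weight:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def cluster_ensemble(clusterings, min_weight=0.5):
    """Run the four steps on a list of label vectors (one per algorithm)."""
    matrices = [distance_matrix(labels) for labels in clusterings]
    combined = combine(matrices)
    edges = build_graph(combined)
    return partition(len(combined), edges, min_weight)


if __name__ == "__main__":
    # Three hypothetical base clusterings of six genes.
    base = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
        [1, 1, 1, 0, 0, 0],
    ]
    print(cluster_ensemble(base))
```

With the three toy clusterings above, two of the three base results agree on the split {0, 1, 2} versus {3, 4, 5}, so the sketch prints `[0, 0, 0, 1, 1, 1]`. A real graph partitioner would also balance cluster sizes or minimize cut weight, which this thresholding stand-in does not attempt.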

Similar resources

LCE: a link-based cluster ensemble method for improved gene expression data analysis

MOTIVATION It is far from trivial to select the most effective clustering method and its parameterization, for a particular set of gene expression data, because there are a very large number of possibilities. Although many researchers still prefer to use hierarchical clustering in one form or another, this is often sub-optimal. Cluster ensemble research solves this problem by automatically comb...

A Link-Based Cluster Ensemble Approach for Improved Gene Expression Data Analysis

It is difficult to select the most suitable and effective clustering algorithm for a given set of gene expression data, because there are a huge number of options and a huge number of gene expressions. At present many researchers prefer to use hierarchical clustering in different forms, but this is often not optimal. Cluster ensemble research can solve ...

Gene Expression of CD226 and Its Serum Levels in Patients With Multiple Sclerosis

Background: Recent studies have found some genetic variants as a risk factor for autoimmune diseases such as Multiple Sclerosis (MS). Cluster of Differentiation 226 (CD226) is one of the risk factors for MS.  Objectives: The present study aimed to evaluate the gene expression of CD226, and its protein serum level in peripheral blood samples of MS patients and healthy individuals. Materials & ...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

A new ensemble clustering method based on fuzzy c-means clustering while maintaining diversity in ensemble

Ensemble clustering has been considered one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. Ensemble clustering first produces several base clusterings, and then uses an aggregation function to create a final clustering that is as similar as possible to all the base clusterings. The inp...

Publication date: 2004